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Abstract 

We  present  a  certified  reduced  basis  method  for  high-fidelity  real-time  solution  of  parametrized  partial  differ¬ 
ential  equations  on  deployed  platforms.  Applications  include  in  situ  parameter  estimation,  adaptive  design  and 
control,  interactive  synthesis  and  visualization,  and  individuated  product  specification.  We  emphasize  a  new  hier¬ 
archical  architecture  particularly  well  suited  to  the  reduced  basis  computational  paradigm:  the  expensive  Offline 
stage  is  conducted  pre-deployment  on  a  parallel  supercomputer  (in  our  examples,  the  TeraGrid  machine  Ranger); 
the  inexpensive  Online  stage  is  conducted  “in  the  field”  on  ubiquitous  thin/inexpensive  platforms  such  as  laptops, 
tablets,  smartphones  (in  our  examples,  the  Nexus  One  Android-based  phone),  or  embedded  chips.  We  illustrate 
our  approach  with  three  examples:  a  two-dimensional  Helmholtz  acoustics  “horn"  problem;  a  three-dimensional 
transient  heat  conduction  “Swiss  Cheese”  problem;  and  a  three-dimensional  unsteady  incompressible  Navier- 
Stokes  low-Reynolds-number  “eddy-promoter”  problem. 
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1.  Introduction 

Many  engineering  applications  require  high-fidelity 
real-time  simulation  on  deployed  platforms  “in  the 
field.”  Examples  include  in  situ  parameter  estimation 
and  identification  procedures,  embedded  adaptive  de¬ 
sign  and  control  systems,  virtual  reality/synthesis  and 
visualization  environments  (from  music  to  medicine), 
and  individuated  context-dependent  product  specifica¬ 
tion  frameworks.  In  all  these  cases  the  mathematical 
model  must  be  sophisticated,  the  numerical  approxi¬ 
mation  must  be  accurate,  and  the  response  to  a  query 
must  be  rapid  —  commensurate  with  real-time  deci¬ 
sion  or  interaction  requirements  —  despite  the  limited 
processor  power  and  storage  capacity  available  in  the 
field.  We  shall  furthermore  be  interested  in  both  input- 
output  evaluation  and  visualization;  the  latter  places 
additional  demands  on  memory. 

We  shall  suppose  that  the  system  input  p  e  D  clp 
enters  as  a  parameter  in  a  partial  differential  equation 
(PDE)  which  describes  the  relevant  physical  phenom¬ 
ena  over  the  time  interval  of  interest  0  <  t  <  tf  and  the 
appropriate  spatial  domain  Q  c  W1 ,  d  =  2  or  3.  This 
PDE,  say  a  linear-time-invariant  (LTI)  parabolic  equa¬ 
tion,  yields  (i)  a  field  variable  over  SI,  u(t\p)  e  X(S1) 
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(where  X(Q)  is  an  appropriate  function  space),  and  (ii) 
a  scalar  output  of  interest,  s(t,  p)  e  K,  which  can  be  ex¬ 
pressed  as  a  (say)  linear  functional  of  the  field  variable, 
s(t;  p)  =  t(u(t\ p)).  (In  actual  practice  we  may  consider 
many  outputs.)  Note  that  the  parameter  dependence 
proceeds  from  the  PDE  through  the  field  variable  and 
finally  to  the  engineering  output. 

We  shall  distinguish  between  the  pre-deployment 
period  and  the  post-deployment  or  equivalently  de¬ 
ployed  period.  The  pre -deployment  period  takes  place 
in  the  laboratory:  we  prepare  the  system  and  associ¬ 
ated  computational  model  for  subsequent  service.  The 
deployed  period  takes  place  in  the  field:  we  put  the 
system  and  associated  computational  model  —  now 
implemented  on  an  embedded  or  more  generally  “de¬ 
ployed  platform”  —  into  service.  In  the  deployed 
stage  the  computational  task  is  well-prescribed:  given 
a  query  instance  p'  e  D  we  wish  to  (a)  predict  the  out¬ 
put,  p'  e  D  — >  (u(tk;  p')  — >)  s(tk\ p'),  0  <  k  <  K,  and 
( b )  visualize  the  field,  p'  e  D  — >  u(tk;  p')\k,0  <  k  <  K\ 
here  R  c  SI  is  a  region  or  manifold  selected  for  render¬ 
ing.  (We  reserve  p'  to  denote  a  query  instance  -  a  re¬ 
quest  post-deployment.)  Note  the  field  variable  plays 
an  important  role  both  in  input-output  evaluation  and 
of  course  in  visualization. 

To  perform  this  computational  task  the  PDE  is  typi¬ 
cally  discretized  by  a  finite  difference  discretization  in 
time  and  a  finite  element  (FE)  discretization  in  space. 
In  time  we  consider  a  (say)  Crank-Nicolson  scheme 
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associated  to  time  levels  tk  —  kAt ,  0  <  k  <  K.  where 
At  —  tf/K;  in  space  we  consider  Galerkin  projec¬ 
tion  over  a  FE  approximation  subspace  XN  (c  X) 
of  large  dimension  N.  Our  “truth”  approximation  is 
then  given,  for  any  p  e  2),  by  uN (tk; p),  sN (tk\ p)  = 
t(uN{tk\p)),  0  <  k  <  K.  We  note  that,  given  our 
restriction  to  p  e  2),  all  solutions  of  interest  per¬ 
force  reside  on  the  parametrically  induced  manifold 
Mn  =  { uN(tk ;  p)  \  0  <  k  <  K,  p  e  £>}.  We  observe 
that  this  manifold  is  relatively  low-dimensional;  we 
can  further  anticipate,  and  in  many  cases  demonstrate, 
that  this  manifold  is  smooth. 

Our  truth  approximation  shall  provide,  for  suffi¬ 
ciently  small  At  and  in  particular  for  sufficiently  large 
N,  the  desired  accuracy.  However,  we  cannot  expect 
real-time  response  in  particular  on  deployed  platforms 
typically  characterized  by  limited  processor  power  and 
memory  capacity.  We  thus  pursue  the  certified  reduced 
basis  (RB)  approach  [1,  2,  3,  4,  5]  as  an  approxima¬ 
tion  to  the  truth  approximation.  In  time  we  directly  in¬ 
herit  the  Crank-Nicolson  discretization  of  the  truth;  in 
space  we  consider  Galerkin  projection  over  an  RB  ap¬ 
proximation  space  XN  (c  XN  )  of  small  dimension  N . 
Our  RB  approximation  is  then  given,  for  any  p  e  2), 
by  u^(tk\ p),  SN(tk\ p)  =  £(uN(tk',  p)),  0  <  k  <  K.  We 
also  provide  rigorous  a  posteriori  bounds,  A^(tk;p) 
and  A'N(tk;  p),  for  the  error  in  the  RB  field  approxima¬ 
tion  and  the  RB  output  approximation,  respectively: 
for  any  p  e  2),  \\uN(tk;p)  -  uN(tk\p)\\x  <  A N(tk',p), 
| sN(tk',p)  -  sN(tk'.p) |  <  AsN(tk',p).0  <  k  <  K.  We  may 
thus  say  that  our  RB  approximation  is  certified. 

The  RB  approximation  space  XN  is  specifically  de¬ 
signed  to  well  approximate  functions  which  reside  on 
the  parametrically  induced  manifold  of  interest,  X\N '. 
indeed,  XN  is  developed  as  the  span  of  optimally  se¬ 
lected  (and  combined)  snapshots  on  the  manifold  ATV. 
(In  contrast,  even  in  a  mesh-adaptive  context,  the  truth 
FE  approximation  space  XN  can  represent  a  large  class 
of  functions  very  distant  from  Afv . )  We  can  thus  ex¬ 
pect  (V  «:  N.  The  latter  may  in  certain  simple  in¬ 
stances  be  proven,  may  in  general  be  confirmed  a  pos¬ 
teriori  through  our  error  bounds,  and  may  in  practice 
be  observed  in  a  wide  variety  of  problems.  (Of  course 
the  “constants”  will  certainly  depend  on  the  particu¬ 
lar  problem  under  study  and  especially  on  the  num¬ 
ber  of  parameters,  P.)  This  reduction  in  dimension 
in  conjunction  with  an  Offline-Online  computational 
approach  provides  the  RB  advantage  in  the  real-time 
deployed  context.  We  now  discuss  the  Offline-Online 
decomposition. 

In  the  Offline  stage  we  develop  the  RB  space;  we 
identify  optimal  (combinations  of)  snapshots  on  Al^ 
and  we  “precompute”  various  parameter-independent 
functionals  of  these  snapshots  implicated  in  subse¬ 
quent  RB  approximations  and  associated  RB  error 
bounds;  this  Offline  stage  is  expensive  —  0(Ny) 
FLOPs,  where  y  is  a  problem-dependent  factor  re¬ 


lated  to  the  computational  cost  of  the  truth  solves. 
The  Offline  stage  yields  a  (problem-dependent)  Online 
Dataset;  this  dataset  is  small  —  0(  Q,  N)  data,  where  Q 
measures  the  parametric  complexity  of  our  PDE.  In  the 
Online  stage  we  invoke  the  Online  Dataset  to  perform 
rapid  certified  output  evaluation:  given  any  p'  e  D 
we  calculate  the  RB  output  approximation  and  asso¬ 
ciated  RB  output  error  bound,  respectively  SN(tk;p') 
and  AsN(tk\p'),  0  <  k  <  K.  This  Online  stage  is  very 
inexpensive  —  0(Q,N.K)  FLOPs  with  (V  <sc  N.  (In 
the  next  section  we  also  discuss  Online  certified  visu¬ 
alization;  in  this  case  the  Online  Dataset  and  Online 
operation  count  will  depend  on  N  and  in  particular  on 
the  number  of  FE  degrees  of  freedom  associated  with 
R.  We  note,  however,  that  the  visualization  is  a  useful 
but  optional  step:  the  key  quantities  are  the  output(s) 
and  associated  error  bound(s).) 

We  now  associate  the  Offline  stage  to  the  pre¬ 
deployment  period  and  the  Online  stage  to  the  post¬ 
deployment  period.  The  expensive  Offline  stage  is  con¬ 
ducted  prior  to  deployment  and  hence  the  considerable 
Offline  cost  is  not  our  principal  concern.  (Of  course, 
control  of  the  Offline  cost  is,  in  practice,  very  impor¬ 
tant;  we  discuss  this  further  in  the  next  section.)  Only 
the  inexpensive  Online  stage  is  invoked  in  the  deployed 
period  and  hence  only  the  very  low  Online  cost  will 
determine  our  primary  performance  metric  —  reliable 
and  rapid  response  in  the  field.  We  may  thus  achieve 
our  objective  of  high-fidelity  real-time  simulation  on 
deployed  platforms,  as  we  now  describe. 

In  the  Online  stage,  the  response  to  each  query  in¬ 
stance,  p'  e  2)  — ■»  sjv(f *•';//),  A ^(fA’;//),0  <  k  < 
K,  requires  sufficiently  few  operations  —  0(Q,  N.  K ) 
FLOPs,  independent  of  AC  —  and  sufficiently  little 
data  —  0(Q,N)  storage  for  the  Online  Dataset,  in¬ 
dependent  of  N  —  to  achieve  real-time  response  on 
deployed  (“thin”)  platforms.  Furthermore,  our  rigor¬ 
ous  error  bound  A sN(tk',  p'),  0  <  k  <  K.  will  guarantee 
the  accuracy  of  the  RB  output  prediction  relative  to  the 
high-fidelity  truth.  (We  emphasize  that  the  error  bound 
does  not  require  appeal  to  uN{tk\p'),sN{tk\p').)  We 
thus  obtain  not  just  rapid,  but  also  accurate,  optimal, 
and  safe  decisions  in  the  field. 

In  Section  2  we  describe,  for  a  simple  model  prob¬ 
lem,  the  reduced  basis  approach.  We  emphasize  the 
computational  aspects:  the  RB  approximation  and  as¬ 
sociated  RB  a  posteriori  error  estimation  “kernels”; 
the  procedure  for  identification  of  optimal  RB  ap¬ 
proximation  spaces;  and  the  Offline,  Online  Dataset, 
and  Online  decomposition.  In  Section  3  we  elabo¬ 
rate  upon  the  Offline  and  Online  procedures  within 
a  hierarchical  architecture:  we  present  the  Offline 
procedure  from  a  parallel  perspective  and  describe  a 
particular  implementation  on  the  TeraGrid  supercom¬ 
puter  Ranger  at  the  Texas  Advanced  Computing  Cen¬ 
ter  (TACC);  we  present  the  Online  procedure  from  a 
deployed/embedded  perspective  and  describe  a  partic- 
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ular  implementation  on  a  Nexus  One  Android  phone 
“model  platform.”  In  Section  4  we  present  results  for 
three  examples:  a  frequency-domain  acoustics  prob¬ 
lem  in  a  two-dimensional  horn  configuration  O  — 
to  illustrate  the  necessity  of  high-fidelity  PDE  mod¬ 
els  and  accurate  numerical  solutions;  a  transient  lin¬ 
ear  heat  conduction  problem  in  a  three-dimensional 
“Swiss  Cheese”  configuration  £1  —  to  illustrate  treat¬ 
ment  of  many  parameters;  and  a  transient  incompress¬ 
ible  fluid  flow  problem  in  a  three-dimensional  sphere- 
in-duct  configuration  £1  —  to  illustrate  extension  to 
(quadratic)  nonlinearities. 

2.  Certified  Reduced  Basis  Formulation 

2.1.  Model  Problem 

We  shall  illustrate  the  approach  for  a  very  simple 
model  problem.  We  consider  steady  heat  conduction 
in  a  (say,  polygonal)  domain  £1  =  Q  U  Q2:  the  nor¬ 
malized  thermal  conductivity  in  Q  (respectively.  Or) 
is  unity  (respectively,  k).  We  apply  a  uniform  unit  heat 
source  over  the  entire  domain  Q.  We  require  that  the 
temperature  field,  u,  vanish  —  zero  Dirichlet  condi¬ 
tions  —  on  the  domain  boundary  d£l.  We  consider  a 
single  (P  =  1)  parameter:  //  =  k.  the  conductivity  in 
O2 ;  D ,  the  parameter  domain,  is  given  by  (say)  the  in¬ 
terval  [1, 10].  We  take  for  our  output  of  interest,  s,  the 
integral  of  the  temperature  over  Q  .  (Note  we  may,  for 
example,  expand  the  model  to  include  convection  by  a 
prescribed  incompressible  velocity  field.  Many  other 
extensions  are  possible.) 

In  mathematical  terms,  u(p)  e  X,  where  X  =  //,'(©); 
here  //f'(Q)  =  {v  €  /^(Q)  |  v|dfi  =  0],  //'(D)  =  {v  e 
L2(Q)|  Vv  e  (L2(0))2),  and  L2(Cl)  is  the  space  of 
square  integrable  functions  over  £1.  We  associate  to 
the  space  X  the  inner  product  (vv,  v)x  =  fn  Vvv  •  Vv  and 
induced  norm  ||w||x  =  v/(w,w)x  and  to  the  space  Lr( O) 
the  inner  product  (vv,  v)  =  fn  vvv  and  induced  norm 
||w||  =  V(w,  vv).  We  then  define  the  continuous  and  co¬ 
ercive  bilinear  forms  a1  =  fQl  Vu’-Vv,  a2  s  fQ2  Vw-Vv, 
and 

a(w,  v;  p)  =  a1  (w,  v)  +  pa2(w,  v),  Vvv,  v  e  X,  (1) 

and  the  bounded  linear  forms  /(v)  =  v  and  £(v)  = 
fQI  v.  We  can  now  provide  the  weak  statement  of 
our  PDE:  given  p  e  D.  find  u(p)  e  X  such  that 
a(u(p),v,p)  =  f{y),  Vv  e  X\  evaluate  s(jj)  =  £(u(p)). 
(We  may  readily  accommodate  several  or  even  many 
outputs.) 

We  note  that  (1)  is  a  special  case  of  a  more  general 
hypothesis.  We  say  that  our  bilinear  form  a  is  “affine 
in  parameter”  (or  more  precisely,  “affine  in  functions 
of  the  parameter”)  if  we  can  write 

Q 

a(w,  v;  p)  =  J]®q(p)aq(w,v),  (2) 

q=l 


where  the  ®q  :  D  — >  M.  are  easily  evaluated 
(0(1)  FLOPs)  parameter-dependent  coefficient  func¬ 
tions  and  the  aq  are  parameter-independent  continuous 
bilinear  forms.  Similar  expansions  may  be  developed 
for  /  and  (.  The  assumption  (2)  is  a  prerequisite  for 
the  crucial  Offline-Online  decomposition  discussed  in 
greater  detail  below. 

In  fact  many  problems  admit  a  representation  of  the 
form  (2)  which  can  either  be  identified  by  inspection 
or,  in  the  case  of  certain  geometric  variations,  be  con¬ 
structed  in  an  automated  fashion  [5].  (Note  that  the 
RB  approximation  is  developed  on  a  fixed  reference 
domain  Q;  geometric  variations  thus  appear  as  coeffi¬ 
cient  functions  which  arise  from  the  transformation  of 
the  actual  parameter-dependent  domain  Q °(ji)  to  £1.) 
More  generally,  we  can  develop  an  affine  representa¬ 
tion  which  approximates  the  bilinear  form  a  [6];  in 
certain  cases  we  can  further  develop  rigorous  bounds 
to  quantify  the  additional  “consistency”  errors  intro¬ 
duced  [7], 

We  next  define  the  truth  finite  element  (FE)  approx¬ 
imation.  We  introduce  a  triangulation  of  O,  Tv,  to 
which  we  associate  a  standard  first-order  conforming 
polynomial  FE  approximation  space  XN .  We  can  then 
provide  the  truth  approximation:  given  p  e  2),  find 
uN(p)  €  XN  such  that  a(uN(p),  v;  p)  =  /(v),  Vv  e  XN\ 
evaluate  sN(p)  =  £(uN  (p)).  For  future  reference  we 
denote  by  *V  c  {1, . . . ,  N]  the  set  of  vertices  of  TN 
associated  to  a  region  or  manifold  Sell  over  which 
we  wish  to  visualize  the  field  uN (p). 

2.2.  Reduced  Basis  Kernels 

We  consider  the  primal -only  reduced  basis  (RB)  for¬ 
mulation.  (In  actual  practice  we  often  pursue  primal- 
dual  RB  approximations,  as  described  in  detail  in  Sec¬ 
tion  11  of  [5]:  primal-dual  approaches  can  be  signif¬ 
icantly  more  efficient  both  in  the  Offline  and  Online 
stages.)  We  shall  assume  that  we  are  given  Nmax  nested 
RB  approximation  spaces  A)  c  AA  c  •  •  ■  XN  •  •  •  c 
A,vmai ;  here  AT  is  of  dimension  N.  The  RB  approx¬ 
imation  ux(p)  e  Xn  is  expressed  in  terms  of  ( ,  )x- 
orthonormal  basis  functions  {T)},=  i,...,,vm„: 

N 

uN(x\  p)  =  ^  U)N  j(p)£j(x),  (3) 

i=  i 

where  x  is  a  point  in  O.  The  e  XN  —  orthonormal- 
ized  snapshots  from  —  are  in  practice  piecewise- 
linear  functions  over  O  defined  by  the  nodal  values 
£j(x ;),  1  <  i  <  N,  1  <  j  <  N,  where  x,-  denotes  a 
vertex  of  the  triangulation  TN .  (Note  that  for  visual¬ 
ization  purposes  un(p) k  e  XN  may  be  reconstructed 
solely  in  terms  of  ^;(x,),  i  e  'V .  1  <  j  <  N .)  At  the 
conclusion  of  this  section  we  shall  be  in  a  position  to 
succinctly  summarize  the  algorithm  by  which  these  RB 
spaces  —  in  particular,  the  underlying  snapshots  —  are 
optimally  selected. 
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We  can  now  provide  the  RB  Galerkin  approxima¬ 
tion:  given  p  e  D,  find  u^(p)  e  X ^  such  that 
a(uxi(p),  v;  p)  =  /(v),  Vv  e  XN\  evaluate  sn(p)  = 
£(un(p)).  We  know  by  Cea’s  Lemma  that  this  approx¬ 
imation  is  optimal  (in  the  energy  norm):  the  Galerkin 
projection  chooses  the  best  linear  combination  of  snap¬ 
shots.  We  can  further  develop  the  corresponding  ma¬ 
trix  equations  (inserting  (3)  into  our  weak  form  and 
testing  on  v  =  1  <  i  <  N):  given  p  e  D, 

find  coN(p)  e  Rw,  the  solution  to  AN(p)ojN(p)  =  FN, 
and  evaluate  s^ip)  -  LTa>N(p).  Here  ANij(p)  = 
1  <  i,j  <  N ,  is  our  stiffness  matrix,  F  N  t  = 
/(£■),  1  <  i  <  N,  is  our  load  vector,  and  LNl  =  £(£,),  1  < 
i  <  N,  is  our  output  vector.  Note  that  AN,  F N,  and  LN 
are  principal  submatrices/subvectors  of  ANmm,  FNmm, 
and  Ln„,„  ,  respectively,  thanks  to  our  hierarchical  RB 
spaces. 

We  could  simply  calculate,  for  any  given  query  in¬ 
stance  //,  the  matrix  elements  A^ij(p')  =  «(£/,  p' ), 
1  <  i,  j  <  N,  as  N2  integrations  over  Q.  However  this 
approach  is  very  expensive  —  0(NN 2)  —  and  indeed 
perhaps  even  more  expensive  than  direct  calculation 
of  the  truth  FE  approximation  uN(p'),  sN(p’).  Instead, 
and  in  anticipation  of  our  Offline-Online  strategy,  we 
must  partition  the  computation  of  o>n(jj')  and  s^ip') 
into  two  complementary  components:  a  “construction” 
part  which  will  be  expensive  —  operation  count  O(N) 

—  but  which  will  not  depend  on  the  query  instance  p' 
and  hence  may  be  performed  in  the  pre-deployment 
period;  an  “evaluation"  part  which  will  depend  on  the 
query  instance  p'  but  which  will  be  inexpensive  —  op¬ 
eration  count  O(N),  not  0(N )  —  and  hence  can  be  ac¬ 
commodated  in  the  deployment  period. 

The  key  enabler  is  the  “affine  in  parameter”  struc¬ 
ture  of  the  bilinear  form:  (1)  implies  that  a((j,  £,■;  p)  - 
al(£j,£i)  +  pa2(£j,Q),  1  <  i,  j  <  N,  and  hence  that 
An(jj)  =  AlN  +  hA2n,  where  AlNj  j  =  al( £,-,£)  and 
A2Nij  =  a2(£j,£j),  1  <  i,j  <  N,  are  parameter- 
independent  “proto”-stiffness  matrices.  (In  the  oper¬ 
ation  count  and  storage  estimates  developed  below  we 
shall  extend  this  argument  to  the  general  affine  expan¬ 
sions  described  by  (2).  In  this  case  we  will  now  have 
Q  proto-stiffness  matrices.) 

The  necessary  construction-evaluation  partition  is 
now  readily  identified.  In  the  construction  part  we 
compute  (and  store)  the  elements  of  AjL  and  A2N  —  in 
0(NN2)  FLOPs.  In  the  evaluation  part,  for  a  given 
query  instance  //,  we  first  form  A^{p')  -  AjL  +  p'A2N 
from  (the  stored)  Aj^  and  A2N  —  in  0(N2)  FLOPs;  we 
then  solve  the  small  system  An(jj')u>n(p')  =  F n  —  in 
0(N3)  FLOPs;  finally  we  calculate  sN(p’)  -  Ltnlj n(m') 

—  in  0(N )  FLOPs.  (Note  that  AN(p)  is  small  but,  un¬ 
fortunately,  full.)  The  operation  count  for  the  evalua¬ 
tion  part  is  independent  of  the  truth  resolution,  N . 

We  next  provide,  now  given  un(jj)  (equivalently 
ojn(p)),  the  RB  a  posteriori  error  estimators.  We  first 


define  the  residual  as  r(v;  p)  =  /(v)  -  aiu^ip),  v;  p), 
Vv  e  XN .  The  Riesz  representation  of  the  residual 
Rip)  e  XN  is  given  by 

(R(p),v)x  =  r(v),  Vv  G  XN.  (4) 

We  may  then  introduce  our  error  bounds  A^ip)  =  ||R|Ly 
and  A sN(p)  s  Q||R||x,  where  C{  is  a  calculable  con¬ 
stant  (the  dual  norm  of  ().  Note  that  the  A- norm  error 
bound  Ajv ip)  has  the  anticipated  form:  the  dual  norm 
of  the  residual  divided  by  a  lower  bound  for  a  stability 
(here  coercivity,  more  generally  inf-sup)  constant;  for 
our  simple  model  problem  a  lower  bound  for  the  sta¬ 
bility  constant  is  unity,  but  in  general  this  will  not  be 
the  case. 

It  can  be  readily  demonstrated  that  for  any  p  in  D 
(and  for  any  N,  1  <  N  <  AW),  \\uN(p)  -  uN(p)\\x  < 
A N(p)  and  \sn(p)-sn(p)\  <  A sN(p).  We  can  further  pro¬ 
vide  an  effectivity  result:  for  any  p  in  A)  (and  for  any 
N,  1  <  N  <  N max),  AN(p)/\\uN(p)  -  uN(p)\\x  <  10;  our 
error  bound  is  thus  rigorous  and  “sharp.”  (The  effectiv¬ 
ity,  here  10,  will  more  generally  depend  indirectly  on 
the  parameter  domain  D  and  directly  on  the  coercivity 
and  continuity  constants  associated  with  the  bilinear 
form  a.) 

In  anticipation  of  our  Offline-Online  strategy  we 
must  now  find  a  partition  of  the  computation  of  ||R||x 
into  construction  and  evaluation  components.  Towards 
that  end,  we  insert  the  residual  expression  into  (4)  and 
invoke  the  affine  structure  of  our  bilinear  form,  (1),  and 
our  representation  (3),  to  obtain 

N 

(Rip),  v)x  =  f(v)  -  'Yj  OJNiipWih,  v) 

i=  1 
N 

-  2  pojNiipia2^,,  v),  Vv  e  XN  . 

i=  1 

We  then  apply  linear  superposition  to  express  R(/u)  as 

2N+1 

Rip)  -  2  <D ,(p)§i ,  (5) 

i—  1 

for  <J),:  D  — >  R,  Qi  e  XN ,  1  <  i  <  2N  +  l.1  Note  that 
the  “Riesz’s  pieces”  Q,,  1  <  i  <  N,  are  independent  of 
p.  Finally,  from  (5)  it  follows  that 

A sip)  =  sJ®T(p)A®(p),  A %(p)  =  C(An(p)  ,  (6) 

where  A u  =  (Qi,Qj)x,  1  <  i,  j  <  2N  +  1. 

The  construction-evaluation  partition  is  then  clear: 
in  the  construction  part  we  form  and  store  the  p’ - 
independent  quantity  A  (and  Q )  —  in  OiNN2) 


We  provide  the  explicit  form:  <t>i  =  1.  dvip )  =  a>m  (p), 

®3 (P)  =  WiV2(jU), - <I)A'+2  =  ftWjvlt/r),  ■  •  •  .02V+1  =  pUNNipY 

Vv  s  XN ,  (0uv)x  =  /(v),  (@2,v)x  =  -a\( i,v),  (&3,v)x  = 
-a'fe.v), . . .,  (Qn+ 2,v)x  =  -a2(fi,v), ...,  (Qin+i,v)x  = 
-a2^N,  v). 
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FLOPs;  in  the  evaluation  part  we  calculate  the  error 
bound  from  (6)  —  in  0(4N2)  FLOPs.  (In  general,  the 
stability  constant  cannot  be  deduced  by  inspection  but 
rather  must  be  calculated  by  an  Offline-Online  “Suc¬ 
cessive  Constraint  Method”  (SCM)  [8]:  the  SCM  Of¬ 
fline  stage  is  rather  onerous;  however,  the  SCM  On¬ 
line  cost  depends  only  on  Q  and  is  typically  negligible 
compared  to  the  RB  Online  cost.) 

We  can  encapsulate  the  computational  tasks  asso¬ 
ciated  with  RB  approximation  and  RB  error  estima¬ 
tion  in  the  following  procedures.  Construction  is  rep¬ 
resented  as 

[<Seval]  =  Construction)^}, (7) 

where  <Seval  =  {A^A2^,  FNmax,LNma,  A,Q).  Evalu¬ 
ation  is  represented  as 

[wAf(S),Aw(S),sJV(S),AJf(S),A/*]  = 

Evaluation)^;  A;<Seval),  (8) 

where  H  c  T)  is  either  a  single  value  or  a  set  of  values 
of  the  parameter,  and  //  =  arg  max^gs  (which 

will  serve  in  the  Greedy  Procedure  below). 

We  now  summarize  the  operation  counts  under  the 
general  affine  hypothesis  (2)  and  for  both  linear  ellip¬ 
tic  PDEs  and  LTI  parabolic  PDEs.  In  the  elliptic  case, 
K  -  1;  in  the  LTI  parabolic  case,  K  denotes  the  num¬ 
ber  of  time  levels  (note  that  Evaluation  returns  the 
RB  approximation  and  associated  error  bounds  for  all 
time  levels,  1  <  k  <  K).  The  operation  count  for 
Construction  is  0(NQN2  +  NQ2N2  +  Ny  QN)\  there 
is  no  explicit  dependence  on  K ,  though  certainly  for 
more  complex  temporal  dependence  we  will  require 
larger  N  for  any  given  error  tolerance.  The  storage  re¬ 
quirement  for  <Scval  is  0(QN2  +  Q2N2).  The  operation 
count  for  Evaluation  is  0(QN 2  +  N3  +  KN 2  +  Q2N2)', 
there  is  no  explicit  dependence  on  P,  though  in  gen¬ 
eral  larger  P  (more  parameters)  will  require  larger  N 
for  any  given  error  tolerance. 

2.3.  The  Greedy  Procedure 

To  identify  our  snapshots  and  hence  our  RB  spaces 
XN,  1  <  N  <  /Vmax ,  we  apply  the  Greedy  Procedure 
of  Algorithm  1.  (In  the  case  of  parabolic  PDEs  this 
Greedy  Procedure  must  be  replaced  with  a  POD(f)- 
Greedy(/()  Procedure  [9].)  The  spaces  are  constructed 
sequentially:  at  each  iteration  we  append  the  snapshot 
from  Al^rain  =  {uN(p)\p  e  Htrain)  which  is  least  well 
represented  by  the  current  RB  approximation  space  as 
measured  by  the  RB  field  approximation  error  bound. 
Note  that  in  Line  8  I  is  the  identity  operator  and  I  Lv  is 
the  orthogonal  projection  onto  XN  with  respect  to  the 
( ,  )x  inner  product. 

The  operation  count  for  the  Greedy  Procedure 
(for  elliptic  PDEs)  is  0(APWmax)  +  0(NyQNmax)  + 
0(NQ2Niax)  +  Oin^QNU  +  0(n«™Q2N3m!J  + 


Algorithm  1  Greedy  Algorithm. 

1:  specify  Strain  c  D  of  size  ntrain  and  tolerance  e. 

2:  set  N  =  1,  X\  =  uN(pl)  (jj i  arbitrary  in  D). 

3:  set  4)  =  uN(pi)  (normalized:  ||(fi||x  =  1). 

4:  set  »Seval  =  Construction)*}]). 

5:  set  [..err . / j *]  =  Evaluations113™;  l;Seval). 

6:  while  [err]  >  e  do 

7:  e,v  =  err  (for  “reporting”  purposes); 

8:  £v+i  =  (/  -  n XN)uN(fT‘)  (normalized); 

9:  N<r-N+  1; 

10:  S evai  =  Construction)^;}, -=] . jv)  ( update ); 

11:  set  [„err,.,.,n]  =  Evaluation  (Htrain,  N,  Seval); 

12:  end  while 
13:  set  (Vmax  <-  N. 


(9(ntramA[Jlax).  The  first  term  relates  to  the  calcula¬ 
tion  of  the  snapshots  —  Line  8;  note  y  >  1  relates 
to  the  efficiency  of  the  truth  solution  procedure.  The 
remainder  of  the  terms  relate  to  Line  10  and  Line  1 1 
—  the  operation  counts  associated  with  “integration” 
of  Construction  and  Evaluation  from  N  —  1  to 
N  =  (Vmax.  We  make  two  crucial  observations:  the  up¬ 
date  snapshot  is  determined  by  the  error  bound  and  not 
the  true  error  —  the  truth  solution  is  calculated,  in  Line 
8,  only  for  the  winning  “arg  max”  candidate,  deduced 
in  Line  1 1 ;  as  nlrain  — »  oo,  the  cost  of  the  “many  query” 
calculation  A^(Htrain)  is  determined  solely  by  the  inex¬ 
pensive  Evaluation  —  the  expensive  Construction 
is  asymptotically  negligible.  This  strategy  ensures 
0(ntram)  +  O(N)  rather  than  0(nttamN)  complexity  and 
thus  permits  consideration  of  very  large  train  sets  E1™" 
for  which  Al^r!im  will  indeed  be  a  good  surrogate  for 
the  actual  manifold  of  interest  =  {uN(fi)  |/r  e  Dj. 

In  actual  practice,  the  Greedy  Procedure  can  identify 
very  effective  RB  spaces.  Large  train  samples  —  en¬ 
abled  by  our  error  bound  “importance  sampling”  strat¬ 
egy  and  Construction-Evaluation  many-query 
procedure  —  are  of  course  imperative  in  particular  for 
larger  P. 

2.4.  Offline-Online  Decomposition 

The  Offline-Online  Decomposition  is  now  readily 
identified.  In  the  process,  we  also  now  include  the  vi¬ 
sualization  aspect  of  our  approach. 

The  Offline  stage  is  simply  the  Greedy  Proce¬ 
dure.  The  Greedy  Procedure  (through  Construction) 
yields  Seval. 

The  Online  Dataset  is  then  {tSeval,  <SV1S},  where 
<SV1S  =  fj(xi),  i  e  'V,  1  <  j  <  N.  The  Online  Dataset, 
for  either  linear  elliptic  PDEs  or  LTI  parabolic  PDEs, 
is  of  size  0(QN 2  +  Q2N 2)  +  0(YV\N),  where  \"V\  de¬ 
notes  the  cardinality  of  the  vertex  set  associated  with 
the  desired  visualization  region  'R.  The  first  contri¬ 
bution  is  due  to  <SL'val  and  is  independent  of  N.  The 
second  contribution  is  due  to  Svm  and  of  course  is  not 
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independent  of  JV;  however,  in  practice  we  envision  se¬ 
lective  rendering  and  hence  |'V|  «:  N  —  for  example 
|'V|  «  cN2^,c  «  1,  for  H  a  small  two-dimensional 
manifold  within  a  three-dimensional  domain  £1 
The  Online  stage  comprises  certified  output  eval¬ 
uation  and  visualization  for  any  given  query  in¬ 
stance  //  e  T).  Certified  output  evaluation  — 
calculation  of  the  RB  output  prediction  and  as¬ 
sociated  RB  output  a  posteriori  error  bound  — 
is  effected  by  [o}N(p'),AN(p'),sN(p'),AsN(p'),.]  = 

Evaluation)//; N\Seml);  as  described  in  detail  ear¬ 
lier,  the  operation  count  is  independent  of  N.  Certified 
visualization  is  then  effected  by 

[figure,  Apixip')]  =  Visualization)^)//),  >SV1S), 


uN{xi\n')  =  i  e  *V, 

i=i 

and  A is  an  L2(Q.)  error  bound  for  m,v(/0|«.  (Un¬ 
der  suitable  hypotheses  on  R  we  may  form  A,v  j  (//)  = 
C|Ay(//),  where  Cr  may  be  calculated  Offline.)  The 
operation  count  for  Visualization  is  0(\*V\KN)  (for 
all  time  levels  K  in  the  LTI  parabolic  case). 

3.  Hierarchical  Architecture 

We  have  previously  identified  the  pre-deployment 
period  with  the  Offline  stage  and  the  deployment  pe¬ 
riod  with  the  Online  stage.  Now  we  further  identify 
the  Offline  stage  with  a  parallel  supercomputer  (as  war¬ 
ranted)  and  the  Online  stage  with  a  “thin”  platform:  the 
former  has  the  necessary  speed  and  memory  to  address 
the  large  truth  calculations  required;  the  latter  —  with 
minimal  processing  power  and  memory  capacity  —  is 
of  the  reduced  physical  size  and  weight,  and  cost,  to 
permit  service  (often  at  many  “sites”  or  “installations”) 
in  the  field. 

The  Offline  stage  admits  very  efficient  treatment 
on  a  massively  parallel  machine  [10],  There  are 
two  different  (heterogeneous)  opportunities  for  data- 
parallelism.  First,  in  Line  8  of  the  Greedy  Procedure, 
we  can  exploit  classical  parallelism  in  space  for  cal¬ 
culation  of  the  truth  solution:  the  domain  £1  is  broken 
into  ;;pmc  smaller  subdomains  (and  associated  sets  of  fi¬ 
nite  element  degrees  of  freedom)  which  are  in  turn  dis¬ 
tributed  to  «proc  processors.  Second,  in  Line  1 1  of  the 
Greedy  Procedure,  we  can  exploit  parallelism  in  pa¬ 
rameter  for  calculation  of  the  “arg  max”:  the  “domain” 
Htram  is  broken  into  nproc  partitions  (and  associated  sets 
of  parameter  values)  which  are  in  turn  distributed  to 
nproc  processors.  In  the  first  case,  the  calculations  on 
different  processors  are  of  course  coupled  and  hence 
communication  and  granularity  will  impact  parallel  ef¬ 
ficiency;  in  the  second  case,  the  calculations  on  differ¬ 
ent  processors  are  independent  —  (copies  of)  the  same 


RB  model  executing  different  data  —  and  thus,  as  in 
Monte-Carlo  simulations,  the  parallel  efficiency  will 
be  very  high.  These  algorithms  are  implemented  [10] 
on  (a  subset  of)  the  Ranger  supercomputer,  which  has 
in  total  62,976  cores,  123TB  of  memory,  and  a  theo¬ 
retical  peak  performance  of  579  TFLOPs. 

The  Online  stage  can  be  hosted  on  many  different 
small  and  inexpensive  “deployed”  platforms  from  lap¬ 
tops  and  tablets  to  smartphones  and  embedded  chips. 
The  “App”  software  is  essentially  the  Evaluation 
code  and  the  Visualization  code.  The  Online 
Dataset  for  any  particular  problem  —  required  by 
Evaluation  and  Visualization  —  can  either  be 
stored  on  the  deployed  platform  or  downloaded  over 
the  Internet:  the  former  would  be  relevant  for  “in 
the  loop”  embedded  applications  such  as  detection 
or  control;  the  latter  might  be  relevant  to  more  gen¬ 
eral  but  still  “interactive”  environments  such  as  ed¬ 
ucation  or  design.  In  actual  practice  more  difficult 
problems  (larger  N,  Q )  and  in  particular  more  exten¬ 
sive  visualization  (larger  |'V|)  may  not  be  feasible  on 
very  thin  deployed  platforms  since  the  Online  Dataset 
will  be  too  large.  At  present  the  “App”  is  imple¬ 
mented  in  the  Java  programming  language  and  exe¬ 
cuted  on  the  Nexus  One  Android-based  smartphone  — 
a  “model”  deployed  platform  —  with  a  1GHz  proces¬ 
sor  and  512MB  of  memory  with  double-precision  ac¬ 
curacy;  the  OpenGL  ES  library  is  invoked  for  graphical 
renderings.  Note  that  the  App  has  access  to  a  limited 
subset  of  the  total  memory  which  certainly  restricts  the 
range  of  problems  and  parameters  that  may  be  consid¬ 
ered. 

In  this  context  we  briefly  describe  an  enhancement 
to  our  current  approach  which  simultaneously  can 
extend  both  Offline  parallel  performance  and  Online 
deployed  capability:  “hp”  reduced  basis  approaches 
[11],  In  the  hp  approach,  in  a  new  Offline  “h”  pre¬ 
preprocessing  step  we  (optimally)  decompose  the  pa¬ 
rameter  domain  £D  into  smaller  parameter  subdomains; 
we  then  pursue  the  standard  (in  fact,  “p”)  certified 
reduced  basis  approach  in  each  subdomain  —  Of¬ 
fline  Greedy  Procedure,  and  Online  Evaluation  and 
Visualization.  The  hp  approach  offers  new  op¬ 
portunities  for  Offline  (embarrassing)  parallelism:  the 
standard  “p”  certified  reduced  basis  procedures  can  be 
pursued  in  parallel  on  each  parameter  subdomain.  And 
even  more  importantly,  the  hp  approach  provides  a 
mechanism  for  consideration  of  larger  problems  on  de¬ 
ployed  platforms:  we  may  download  only  the  data  as¬ 
sociated  with  a  particular  parameter  subdomain  or  sub- 
domains,  and  implement  “parameter  caching”  notions 
to  anticipate  future  parameter  (subdomain)  requests. 

Finally,  we  note  that  although  we  implement  the 
Offline  and  Online  stages  on  vastly  different  ma¬ 
chines  with  very  different  capabilities,  the  end  result 
is  in  some  sense  machine-agnostic:  the  smartphone 
(low-order  reduced  basis)  prediction  is,  to  within 
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±A sn(jj),  indistinguishable  from  the  parallel  supercom¬ 
puter  (high-fidelity  finite  element)  calculation.  We  can 
thus  say  that  we  effectively  achieve  “supercomputing 
on  a  phone.”  Absent  rigorous  error  bounds  no  such 
equivalence  can  be  claimed. 


4.  Numerical  Examples 


We  first  consider  the  Helmholtz  equation  and  in  par¬ 
ticular  the  planar  acoustics  horn  problem  described  in 
detail  in  [12].  The  full  domain  Q°  is  depicted  in  Fig¬ 
ure  1(a)  and  comprises  both  the  horn  proper  as  well 
as  a  large  circular  segment  on  which  second-order  ra¬ 
diation  conditions  [13]  are  applied.  The  (symmetric 
about  y  =  0)  horn  domain  Cl°  is  defined  by  the  location 
of  the  top  wall  yhomW,0  <  x  <  10:  j’hom (x)  =1/2  for 
0  <  x  <  5;  yhomW  =  (l/5)(7.5  -  x )  +  (2q/5)(x -  5)  for 
5  <  x  <  7.5;  >-hom(x)  =  (2t7/5)(10-x)  +  6/5(x-7.5)  for 
7.5  <  x  <  10.0;  here  rj  is  a  parameter  that  defines  the 
problem  geometry.  Note  we  state  the  problem  in  non- 
dimensional  form:  all  lengths  are  scaled  by  the  “inlet” 
channel  width  vv;  all  pressures  are  scaled  by  a  nomi¬ 
nal  input  pressure  p°Q  (note  we  replace  our  independent 
variable  u°  with  p°  to  avoid  confusion). 

The  governing  equations  for  the  (complex)  pressure 
are  then 

-V2p0  -  k2p°  =  0,  in  Q°, 

where  k  =  loH/c  for  to  the  angular  frequency  and 
c  the  speed  of  sound.  We  impose  a  right-traveling 
wave  condition  at  the  inlet  F^,  ikp°  +  dp° /dn  =  2 ik, 
homogeneous  Neumann  conditions  on  the  horn  walls, 
dp° /dn  =  0,  and  second  order  radiation  conditions  on 
the  circular  farfield  boundary.  (Here  n  denotes  outward 
normal.)  We  choose  two  parameters,  p\  =  r/,p2  =  K 
and  specify  the  parameter  domain  D  —  [1.7, 1.8]  x 
[0,2],  (A  larger  parameter  domain  is  readily  accom¬ 
modated  by  the  reduced  basis  approach  —  but  not  by 
our  current  deployed  platform.)  The  output  of  interest 
is  the  modulus  of  the  reflection  coefficient, 


sip)  = 


p°dy  -  1 


note  that  1  —  s  can  be  related  to  the  fraction  of  the  wave 
energy  transmitted  to  the  surroundings  [14],  Finally, 
for  our  “truth”  we  take  a  second  order  FE  approxima¬ 
tion  with  14, 935  degrees  of  freedom.  The  horn  prob¬ 
lem  serves  to  demonstrate  the  need  for  high-fidelity 
models  and  accurate  numerical  calculations:  only  (ac¬ 
curate  solutions  to)  the  PDE  can  reproduce  the  detailed 
acoustic  response. 

This  horn  problem  stretches  the  basic  formulation 
of  Section  2  in  several  ways.  First,  the  field  variable 
is  now  complex.  Second,  the  problem  is  elliptic  but 
not  coercive,  which  in  particular  complicates  the  cal¬ 
culation  of  the  (inf-sup)  stability  constant  required  for 


our  rigorous  error  bounds  [15].2  Third,  the  output  can 
be  derived  from  a  linear  functional  but  in  fact  is  not  a 
simple  linear  functional  of  the  field.  And  fourth,  the 
problem  is  posed  on  a  parameter-dependent  domain 
Q.°(rj)'.  upon  piecewise-affine  mapping  to  the  reference 
domain  O  =  Q"(//  =  1.75)  we  recover  the  necessary 
form  (2)  with  Q  =  11  terms.  Note  that  visualization  is 
in  general  performed  with  respect  to  the  actual  (possi¬ 
bly  parameter-dependent)  domain  Cln(p). 

In  this  two-dimensional  case  the  Offline  calculation 
does  not  require  a  large  machine  and  hence  the  Of¬ 
fline  stage  is  performed  on  a  standard  desktop.  (Ob¬ 
viously  three-dimensional  versions  of  this  same  prob¬ 
lem  in  particular  for  higher  k  would  be  demanding 
given  the  oscillatory  structure  of  the  truth  solution.) 
We  shall  thus  focus  on  the  Online  results  as  provided 
by  the  Nexus  One  implementation;  in  all  cases  we  con¬ 
sider  N  -  A/nax  =  53.  We  show  in  Figure  1(a)  the 
Visualization  of  the  real  part  of  the  pressure  for 
r\  =  1.5  and  k  -  1.75.  Note  that  for  two-dimensional 
problems  we  often  can  and  hence  do  render  the  entire 
field  (in  which  case  |*V|  =  3832:  here  we  interpolate 
the  field  onto  a  first  order  mesh  to  reduce  Y'V\  )  even  on 
relatively  thin  platforms.  We  next  present  (though  in 
practice  compute  first)  in  Figure  1(b)  the  Evaluation 
of  our  output  for  S  given  by  (?/  =  1.5,&,-)/=i,...,4o: 
the  middle  curve  is  sn(E)  while  the  bottom  and  top 
curves  represent  sn(E)  -  AT(H)  and  s#(E)  +  AT(E),  re¬ 
spectively.  We  emphasize  the  importance  of  the  error 
bounds  in  ensuring  not  just  rapid  response  but  also  reli¬ 
able  response:  the  truth  output  must  reside  between  the 
bottom  and  top  curves.  Finally,  we  note  that  both  Fig¬ 
ure  1(a)  and  Figure  1(b)  are  direct  “screen  grabs”  from 
our  Nexus  One  implementation.  Also  we  note  that  in 
this  example  (and  the  subsequent  examples  below)  the 
Online  computation  time  on  the  phone  only  takes  one 
or  two  seconds. 

We  next  consider  the  heat  equation  —  transient  ther¬ 
mal  conduction  —  in  the  complex  three-dimensional 
“Swiss  Cheese”  configuration.  The  domain  Q  is  given 
by  the  unit  cube  [0,  l]3  \  S,  where  S  is  a  lattice  of  27 
spheres  of  radius  1/5  and  respective  centers  [0, 1/2,  l]x 
[0, 1/2, 1],  (Note  we  directly  consider  the  nondimen- 
sional  version  of  the  problem  based  on  the  standard 
conduction  scalings.)  For  future  reference  we  describe 

Q  =  LE=1Q;-, 


where  £lj,  1  <  j  <  8,  is  the  intersection  of 

Q  with  the  8  octants  given  by  [0, 1/2]3,  [1/2, 1]  X 

[0, 1/2]2, . . . ,  [1/2, 

The  governing  equations  are  most  easily  described 


2On  the  phone  we  resort  to  a  shortcut:  the  inf-sup  constant  for 
the  Online  stage  is  replaced  by  the  minimum  inf-sup  lower  bound 
over  a  dense  sample  in  !D  calculated  in  the  Offline  stage. 
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in  weak  form:  find  u(t\  k)  for  0  <  t  <  tf  =  1  such  that 
f  —  v+V  Kj  f  Vm-Vv+  f  VwVv  =  f  v,Vv€X, 

Jn  dt  fa  1  JQj  Jns  Jrb 

(10) 

where  Kj  is  the  thermal  conductivity  of  region  0;  rel¬ 
ative  to  the  thermal  conductivity  of  region  Ox  (note 
we  assume  uniform  specific  heats  and  hence  the  Kj 
may  also  be  interpreted  as  scaled  thermal  diffusivities), 
X  =  {v  e  //'(£!)  |  t'r,  =  0),  Tb  is  the  bottom  bound¬ 
ary  “z  =  0”  (on  which  we  impose  inhomogeneous 
flux  boundary  conditions),  and  Ft  is  the  top  boundary 
“z  =  1”  (on  which  we  impose  zero  Dirichlet  condi¬ 
tions).  Note  that  we  impose  homogeneous  natural,  in 
this  case  zero  flux,  conditions  on  all  other  boundaries. 
We  choose  P  =  7  parameters,  /ij  =  kj,  1  <  j  <  7, 
and  specify  the  parameter  domain  D  =  [0.5, 2]7.  We 
consider  several  outputs  corresponding  to  averages  of 
the  temperature  over  small  cubes;  for  example,  the  first 
output  is  the  average  temperature  over  [0, 1/4]3.  Fi¬ 
nally,  we  select  a  backward  Euler  temporal  discretiza¬ 
tion  with  K  =  100  time  levels  and  a  Pi  truth  finite  ele¬ 
ment  approximation  space  with  N  =  241, 520  degrees 
of  freedom.  This  example  is  intended  to  illustrate  the 
extension  of  the  reduced  basis  formulation  to  parabolic 
PDEs  but  also  the  consideration  of  three-dimensional 
spatial  domains  and  the  treatment  of  many  parameters; 
the  latter  increase  the  burden  of  both  the  Offline  and 
Online  stages. 

The  RB  treatment  of  parabolic  PDEs  parallels  quite 
closely  the  RB  treatment  of  elliptic  PDEs  presented  in 
Section  2.  We  elaborate  briefly  here  on  the  features 
particular  to  RB  treatment  of  LTI  parabolic  PDEs:  The 
affine  hypothesis  on  bilinear  (and  linear)  forms  re¬ 
mains  in  force;  we  deduce  from  inspection  of  the  weak 


Figure  1 :  (a)  Visualization  on  of  the  real  part  of  the  pressure 

lor  //  =  (1.75,2.0).  (b)  Modulus  of  the  reflection  coefficient  for 

(//  =  1 .75.  kj  £  [0. 2 1  );=  i _ 5o .  We  show  the  RB  outputs  (marked 

with  X  symbols)  and  corresponding  upper  and  lower  bounds  —  here 
the  bounds  are  sufficiently  tight  that  they  cannot  be  distinguished 
from  the  RB  output. 


form  (10)  that  for  our  example  we  require  one  proto 
mass  matrix  and  Q  =  8  proto  stiffness  matrices.  The 
RB  Galerkin  approximation  requires  little  modification 
from  the  elliptic  case  in  particular  since  the  RB  tempo¬ 
ral  discretization  is  inherited  from  the  truth.  The  RB 
error  bounds  are,  as  in  the  elliptic  case,  derived  from 
classical  stability  arguments,  however  now  more  care 
must  be  exercised  in  definition  of  the  proper  (spatio- 
temporal)  norms;  the  latter,  in  turn,  affects  the  class  of 
output  functionals  that  we  may  consider  —  for  exam¬ 
ple,  we  can  provide  pointwise  bounds  in  time  only  for 
Lr(Ll)  output  functionals.  The  RB  spaces  are  now  ob¬ 
tained  by  a  P O D ( f  )-G reed y (ju)  Procedure  [9]:  for  each 
/j*  identified  by  the  Greedy(yu),  a  single  POD  mode  — 
associated  with  the  error  in  the  RB  approximation  — 
is  appended  to  the  RB  space.  (Note  also  that  we  may 
“train”  on  an  impulse  function  in  order  to  treat  with 
a  single  RB  approximation  any  control  function  speci¬ 
fied  in  the  Online  (deployed)  stage.)  Finally,  the  opera¬ 
tion  counts  for  LTI  parabolic  PDEs  will  typically  scale 
as  less  than  K  times  the  elliptic  effort,  as  summarized 
in  the  estimates  provided  in  Section  2. 

In  this  example  the  Offline  burden  is  considerable 
and  the  parallel  implementation  (here,  on  Ranger  and 
based  on  the  libMesh  [16]  library)  important  both  as 
regards  the  computation  of  the  truth  time  series  (from 
which  we  extract  our  POD  mode)  —  parallelized  over 
Q  —  and  the  “arg  max”  over  the  extensive  Zlr;lm  sam¬ 
ple  required  by  the  many  parameters  (large  P)  —  par¬ 
allelized  over  Zlraln.  We  present  in  Figure  2  the  POD- 
Greedy  convergence,  as  measured  by  e^,  for  a  train 
sample  of  size  100,000;  computational  (wall-clock) 
time  on  512  cores  is  3.5  hours.  (Note  that  in  this  con¬ 
text  we  may  interpret  e,y  as  a  bound  for  the  maximum 
over  /j  in  Ettam  of  the  maximum  over  k  =  1 , AT,  of 
\\uN(tk; / j)-uN{tk ; yu)||/,2(Q).)  We  now  proceed  to  the  On¬ 
line  stage:  we  present  in  Figure  3  the  Visualization 
for  n'  =  (2,2,0.5,2,1.31,1.745,1.85)  at  the  final  time 
level  k  =  K  as  a  screen  grab  from  the  Nexus  One 
implementation.  Note  that  for  this  three-dimensional 
configuration  we  can  only  afford  (given  phone  memory 
limitations)  to  render  the  field  variable  over  a  “small” 
two-dimensional  manifold:  the  outer  surface  of  the  do¬ 
main.3 

Finally,  we  consider  the  time-dependent  incom¬ 
pressible  Navier-Stokes  equations  at  relatively  low 
Reynolds  number  —  here  we  employ  methodology 
from  [17],  We  consider  pressure-driven  flow  through 
a  duct  ( x,y,z )  €  [0,3]  x  [0,  l]2  which  contains  a  sta¬ 
tionary  solid  sphere  S  of  radius  0.1  with  center  (x  = 
1/2, y  —  1/2,  z  =  1  /2);  our  domain  Q.  is  thus  given  by 
[0, 3]  x  [0,  l]2  \  S .  (In  fact,  we  impose  periodic  bound¬ 
ary  conditions  in  x,  and  hence  we  implicitly  consider 


3In  fact,  we  choose  visualization  vertices  Ty  corresponding  to  a 
uniformly  coarsened  version  of  the  outer  surface  of  the  truth  mesh  in 
order  to  reduce  the  amount  of  data  loaded  onto  the  phone. 
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an  infinitely  long  duct  with  a  periodic  array  of  spher¬ 
ical  “eddy  promoters.”)  We  shall  directly  consider  the 
problem  in  nondimensional  form;  we  choose  a  diffu¬ 
sive  scaling  in  which  all  lengths  are  measured  in  units 
of  the  duct  width  (=  height)  H  and  velocities  are  mea¬ 
sured  in  units  of  v/H,  where  v  is  the  kinematic  viscos¬ 
ity. 

In  weak  from  we  wish  to  find  the  velocity  u(t\  If)  = 
(ux,  uy,  uz)  e  X  for  0  <  t  <  tf  =  0.25  such  that 

LI-v+2i2(vV-(““)+vM-VM)  = 

fnnvx  -  fnVu  ■  Vv,Vy  e  X,  (11) 

where  X  is  the  subspace  of  Hl( Q)pel.  of  functions  which 
are  divergence-free  and  which  vanish  (in  all  compo¬ 
nents)  on  all  solid  walls  (note  the  per  refers  to  3- 
periodicity  in  x).  Here  n  is  a  nondimensional  negative 
pressure  gradient  in  the  x-direction;  the  more  usual 
Reynolds  number  based  on  average  velocity  will  scale 
roughly  as  VlT.  We  choose  P  =  1  parameter,  /j  —  n, 
and  specify  the  parameter  domain  D  =  [100,700]. 
We  consider  two  outputs  corresponding  to  averages  of 
the  streamwise  (x)  component  of  the  velocity,  ux,  over 
small  cubes  with  side-length  of  0. 1  and  with  centers  at 
(x  =  0.3, y  =  1/2,  z  =  1/2)  and  (x  =  0.7, y  =  1/2,  z  = 


Figure  2:  Convergence  of  the  POD-Greedy  algorithm  for  the  “Swiss 
Cheese”  problem. 


HA  Saue  18:S6 


Figure  3:  Visualization  of  the  “Swiss  Cheese”  solution  field  at  the 
final  time  for//  =  (2, 2, 0.5, 2, 1.31, 1.745, 1.85). 


1  /2)  respectively:  the  outputs  can  be  interpreted  as  lo¬ 
cal  Reynolds  numbers;  the  difference  in  these  fore-aft 
outputs  can  be  interpreted  as  measure  of  the  strength 
of  the  nonlinear  (inertial)  effects.  Finally,  we  select  a 
Crank-Nicolson  temporal  discretization  with  K  =  100 
time  levels  and  a  truth  finite  element  approximation 
space  with  N  =  623,632  degrees  of  freedom4.  This 
example  is  intended  to  illustrate  the  extension  of  the 
RB  formulation  to  (quadratic)  nonlinearities. 

This  treatment  of  nonlinear  parabolic  PDEs  builds 
directly  on  the  LTI  parabolic  foundation.  There  are, 
however,  significant  complications  in  particular  related 
to  the  error  bounds.  First,  the  stability  constants  will  be 
significantly  negative  for  higher  Reynolds  number  — 
indicating  exponential  growth  of  the  error  bounds  in 
time.  (In  some  cases  this  exponential  growth  is  “real” 
and  inescapable;  in  some  cases  this  exponential  growth 
is  an  artifice  of  our  energy  arguments  —  and  hence 
could  be  mitigated.)  We  may  thus  not  consider  either 
larger  Reynolds  numbers  or  larger  final  times  tf5.  Sec¬ 
ond,  these  stability  constants,  albeit  pessimistic,  will 
be  much  more  difficult  to  compute  (via  the  SCM  pro¬ 
cedure)  due  to  the  non-LTI  nature  of  the  problem.  And 
third,  the  dual  norm  of  the  residual  is  much  more  diffi¬ 
cult  to  compute;  as  a  result,  the  storage  for  the  Online 
Dataset  and  the  operation  count  for  Evaluation  will 
now  scale  as  Q2N 4  and  not  Q2N2.  (The  hp  approach 
can  effectively  moderate  this  effect  since  the  division 
into  parameter  subdomains  reduces  N .) 

In  this  case  the  truth  calculation  is  expensive:  each 
evaluation  /.i  — >  uN(tk; /.<),  0  <  k  <  K,  requires  57  wall- 
clock  minutes  on  Ranger  with  nproc  =  256.  The  Offline 
effort  on  Ranger  requires  just  slightly  over  12  wall- 
clock  hours  to  complete  the  POD-Greedy  for  ;Vmax  = 
12.  In  this  case,  most  of  the  Offline  effort  is  associated 
with  the  truth  snapshots;  the  “argmax”  is  relatively  in¬ 
expensive  here  since  /Vrnax  is  small  and  also  Htrain  is 
quite  small. 

We  now  proceed  to  the  Online  stage:  we  present  in 
Figures  4a  and  4b  the  two  outputs  as  a  function  of  time 
for  fj'  =  100  and  //  =  700,  respectively,  obtained 
with  N  —  12  basis  functions.  These  plots  are  screen 
grabs  from  our  Nexus  One  implementation  —  in  each 
case  the  output  data  is  generated  in  roughly  2  seconds. 
We  observe  that  for  //  =  700  the  Reynolds  based 
on  maximum  velocity  (at  the  final  time)  is  roughly 
30,  indicating  significant  nonlinear  effects;  the  con¬ 
siderable  fore-aft  asymmetry  at  //  =  700  is  further 


4Note  the  latter  is  in  fact  based  on  a  Taylor-Hood  discretization 
in  which  the  pressure  as  Lagrange  multiplier  is  introduced  in  the 
usual  fashion  to  impose  the  divergence-free  condition.  We  choose 
to  write  the  weak  form  (11)  in  div-free  form  since  this  is  our  point 
of  departure  for  the  RB  approximation  —  in  which  all  snapshots  are 
incompressible. 

5  Note  for  our  example  here  in  fact  the  stability  constants  are 
positive:  all  disturbances  monotically  decay.  Examples  at  higher 
Reynolds  number  for  which  the  stability  constants  are  negative  are 
presented  in  [17], 
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indication  of  significant  deviation  from  Stokes  flow. 
Again,  for  each  of  the  two  outputs  we  present  both 
SN(tk;/u'),0  <  k  <  K,  and  also  lower  and  upper 
bounds  sn (tk‘,  n')  ±  AsN(tk;/j'),0  <  k  <  K\  the  truth 
sN{tk\n'),  0  <  k  <  K,  must  reside  within  the  bounds 
provided.  We  also  show  plots  of  the  ^-component  of 
the  velocity  field  from  the  Nexus  One  implementation 
in  Figure  5. 6  In  short,  we  have  addressed  Hilbert’s 
25th  Problem:  “Solve  the  Navier-Stokes  equations  on 
a  phone.” 

Finally,  we  note  that  although  the  emphasis  in  this 
paper  is  on  real-time  deployed  response,  in  fact  our 
approach  is  also  appropriate  for  many-query  studies 


6We  again  choose  visualization  vertices  T,f  corresponding  to  a 
Pi  uniformly  coarsened  version  of  the  mesh. 
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Figure  4:  Two  RB  output  plot  screenshots  from  the  Nexus  One 
implementation  for  the  Navier-Stokes  problem:  (a)  //  =  100,  (b) 
//  =  700.  The  lines  marked  with  x  symbols  are  the  RB  outputs  as 
functions  of  time,  and  the  corresponding  RB  output  bounds  are  plot¬ 
ted  as  solid  lines.  Note  the  change  in  vertical  scale  between  (a)  and 
(b). 


(a) 
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(b) 

Figure  5:  Two  plots  of  the  x-component  of  the  Navier-Stokes  ve¬ 
locity  field  from  the  Nexus  One  implementation:  (a)  //  =  100,  (b) 
n'  =  700.  Here  we  have  taken  an  “L-shaped”  manifold  as  the  vi¬ 
sualization  surface  in  order  to  show  the  velocity  field  in  the  domain 
interior. 


(and  hence  particularly  appropriate  for  real-time  many- 
query  applications  such  as  parameter  estimation).  In 
our  Navier-Stokes  example  we  observe  that  the  break¬ 
even  point  is  12  queries:  after  12  queries  the  reduced 
basis  approach  is  much  less  expensive  than  the  classi¬ 
cal  finite  element  approach  even  if  we  include  in  the 
reduced  basis  budget  the  considerable  Offline  costs. 
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